Quasiperiodic Biosequences and Modulo Incidence Matrices
Authors
Abstract
Algorithm development for finding quasiperiodic regions in sequences is at the core of many problems arising in biological sequence analysis. We solve an important problem in this area. Let $A$ be an alphabet of size $n$ and $A^l$ denote the set of sequences of length $l$ over $A$. Given a sequence $S = s_1 s_2 \cdots s_l \in A^l$, a positive integer $p$ is called a period of $S$ if $s_i = s_{i+p}$ for $1 \le i \le l - p$. $S$ is called $p$-periodic if it has a minimum period $p$. Let $\Pi_l(p)$ denote the set of $p$-periodic sequences in $A^l$. A natural measure of "nearness to $p$-periodicity" for $S$ is the average Hamming distance to the nearest $p$-periodic sequence: $D(S) = \min_{T \in \Pi_l(p)} D(S, T)$. If $T$ is a sequence in $\Pi_l(p)$ such that $D(S, T) = D(S)$, then $T$ is called a nearest $p$-periodic sequence of $S$, and $S$ is called $p$-quasiperiodic associated with the score $D(S)$. This paper develops an efficient algorithm for finding a nearest $p$-periodic sequence of $S$ by means of its modulo-$p$ incidence matrix. Let $\alpha = (\alpha_1, \ldots, \alpha_n)$ and $\beta = (\underbrace{q+1, \ldots, q+1}_{r}, \underbrace{q, \ldots, q}_{p-r})$, where $l = \alpha_1 + \alpha_2 + \cdots + \alpha_n$ is a partition of $l$, $q$ is the quotient, and $r$ is the remainder when $l$ is divided by $p$. This paper shows that there exists a sequence in $A^l$ whose modulo-$p$ incidence matrix has row sum vector $\alpha$ and column sum vector $\beta$.
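The definitions in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not spell out: the modulo-p incidence matrix is taken to be the n-by-p count matrix whose entry for letter a and residue j is the number of positions congruent to j modulo p that hold a, and positions are indexed from 0 rather than 1. The function names `modulo_incidence_matrix` and `nearest_with_period` are hypothetical. The second function uses a simple column-majority rule, so it returns a closest sequence that has p as a period; it does not enforce that p is the minimum period, which the paper's algorithm presumably handles, and it reports the raw mismatch count rather than an average.

```python
def modulo_incidence_matrix(s, p, alphabet):
    """Assumed definition: m[k][j] counts positions i with i % p == j and s[i] == alphabet[k]."""
    index = {a: k for k, a in enumerate(alphabet)}
    m = [[0] * p for _ in alphabet]
    for i, ch in enumerate(s):
        m[index[ch]][i % p] += 1
    return m


def nearest_with_period(s, p, alphabet):
    """Column-majority sketch: a closest sequence having p as a period (not necessarily minimal)."""
    m = modulo_incidence_matrix(s, p, alphabet)
    motif = []
    for j in range(p):
        # Pick the most frequent letter in residue class j (ties go to the earlier letter).
        best = max(range(len(alphabet)), key=lambda k: m[k][j])
        motif.append(alphabet[best])
    t = "".join(motif[i % p] for i in range(len(s)))
    mismatches = sum(a != b for a, b in zip(s, t))
    return t, mismatches


if __name__ == "__main__":
    s, p, alphabet = "ACGTACGAAC", 4, "ACGT"
    m = modulo_incidence_matrix(s, p, alphabet)
    print("row sums:", [sum(row) for row in m])          # letter counts
    print("column sums:", [sum(col) for col in zip(*m)])  # residue class sizes
    print(nearest_with_period(s, p, alphabet))
```

Under this assumed definition, the row sums are the letter counts of the sequence, and the column sums are forced to be q+1 for the first r residue classes and q for the rest, which matches the row and column sum vectors named in the abstract.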
Related resources
Unsupervised Pattern Discovery in Biosequences Using Aligned Pattern Clustering
Protein, RNA and DNA are made up of sequences of amino acids/nucleotides, which interact among themselves via binding. For example, (1) protein-DNA binding regulates gene transcription [1]; and (2) protein-protein binding plays important roles in cell cycle control and signal transduction [2]. The binding is maintained by either the direct participation or assistance of conserved short segments ...
Neweyes: A System for Comparing Biological Sequences Using the Running Karp-Rabin Greedy String-Tiling Algorithm
A system for aligning nucleotide or amino acid biosequences is described. The system, called Neweyes, employs a novel string matching algorithm, Running Karp-Rabin Greedy String Tiling (RKR-GST), which involves tiling one string with matching substrings of a second string. In practice, RKR-GST has a computational complexity that appears close to linear. With RKR-GST, Neweyes is able to detect tr...
Running Karp-Rabin Matching and Greedy String Tiling
A system for aligning nucleotide or amino acid biosequences is described. The system, called Neweyes, employs a novel string matching algorithm, Running Karp-Rabin Greedy String Tiling (RKR-GST), which involves tiling one string with matching substrings of a second string. In practice, RKR-GST has a computational complexity that appears close to linear. With RKR-GST, Neweyes is able to detect t...
Neweyes: A System for Comparing Biological Sequences Using the Running Karp-Rabin Greedy String-Tiling Algorithm
A system for aligning nucleotide or amino acid biosequences is described. The system, called Neweyes, employs a novel string matching algorithm, Running Karp-Rabin Greedy String Tiling (RKR-GST), which involves tiling one string with matching substrings of a second string. In practice, RKR-GST has a computational complexity that appears close to linear. With RKR-GST, Neweyes is able to detect t...
On Hadamard Modulo Prime p Matrices of Size at most 2p + 1
In this note, we continue the study of Hadamard Modulo Prime (HMP) matrices initiated in recent articles [5] – [6]. Namely, we present some new non-existence and classification results for HMP matrices whose size is relatively small with respect to the modulus.
Publication date: 2002